Abstract

Background: Small non-coding RNA molecules (sRNAs) regulate gene expression by different
mechanisms and enable bacteria to mount a physiological response when adapting to the environment or
during infection. Over the last decades the number of known sRNAs has increased rapidly. Several
databases such as Rfam or fRNAdb were extended to include sRNAs as a class of their own. Furthermore,
new specialized databases such as sRNAMap (gram-negative bacteria only) and sRNATarBase (target
prediction) were established. To the best of the authors' knowledge, no database focusing on sRNAs
from gram-positive bacteria is publicly available so far.

Description: To understand the functional and phylogenetic relationships of sRNAs, we have developed
sRNAdb and provide tools for data analysis and visualization. The data compiled in our database are
assembled from experiments as well as from bioinformatics analyses. The software enables comparison
and visualization of gene loci surrounding the sRNAs of interest. To accomplish this, we use a
client-server based approach. Offline versions of the database, including analysis and visualization
tools, can easily be installed locally on the user's computer. This feature facilitates customized
local addition of unpublished sRNA candidates and related information such as promoters or
terminators using tab-delimited files.

Conclusion: sRNAdb allows a user-friendly and comprehensive comparative analysis of sRNAs from available
sequenced gram-positive prokaryotic replicons. Offline versions including analysis and visualization tools facilitate
complex user-specific bioinformatics analyses.



Background 

In recent years numerous small non-coding RNAs
(sRNAs) were discovered in bacteria. This class of RNAs
is crucial to prokaryotic life, modulating transcription or
translation and leading to either activation or repression
of important physiological processes. sRNAs enable
bacteria to trigger rapid physiological responses in order
to adapt to the environment or infectious processes [1-3].

To cope with the increasing number of identified
sRNAs, databases such as fRNAdb, Rfam, sRNAMap
and sRNATarBase were developed [4-9]. All of these
approaches have certain drawbacks. fRNAdb contains all
classes of RNAs, but allows no further analysis. Rfam is
one of the most informative data collections, allowing
detailed analyses via a web front-end. sRNAMap is a
webserver-based application for gram-negative bacteria
only. sRNATarBase compiles experimental data and
allows the prediction of sRNA targets. However, all
databases available to date limit the analysis to published
data only. Bioinformatics analyses of candidate sRNAs in
combination with genomes, terminators and other
relevant information that has not yet been published
therefore remain a very complicated task.

In an attempt to overcome some of the aforementioned
drawbacks, we have developed sRNAdb. Our
database is a locally installable web-suite, permitting the
comparative analysis of sRNAs of gram-positive bacteria
including their flanking genes. User-modified files in
GenBank format, gram-negative bacterial genomes,
pooled sRNA candidates or further features of interest
can be included in locally installed databases.
Furthermore, all integrated analysis tools can also be
used locally.

Construction and content

A database schema of unique keys and entities,
combined with the corresponding relations and
connections, is given in Figure 1. Optional user-defined
extensions to locally installed versions of the database
are indicated with a lighter background color than the
boxes representing database entities.

Input data 

To the best of the authors' knowledge, no general
nomenclature convention for sRNAs exists to date.
Therefore sRNAs imported into our database from the
literature cannot always be unambiguously distinguished
by name, locus or annotation alone. Furthermore, a large
number of published sRNAs are currently annotated as
predicted or putative. This leads to a myriad of sRNAs
bearing indistinct names, positions or ambiguous
annotations. To cope with this difficulty, sRNAdb
contains a unique key composed of information about
the authors, experimental conditions and sRNA
properties, as shown in the table termed snrax in
Figure 1. Annotated sequences of organisms or plasmids
downloaded from NCBI's RefSeq database [10] represent
the replicons in the database. Information annotated in
GenBank-formatted files, such as sequences or genes
filtered from these files, is automatically inserted into
sRNAdb. When sRNAdb is installed locally, users can
furthermore modify the local database by adding
customized features such as terminators, promoters and
other additional data. Terminators predicted by
TransTermHP [11] serve as examples for this option,
as described on the official sRNAdb server homepage.
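
The idea of a composite unique key can be illustrated with a minimal Python sketch. The field names and example values below are assumptions chosen for illustration; they are not the actual columns of the snrax table, and the sketch does not reflect the Java/MySQL implementation of sRNAdb.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SrnaKey:
    """Hypothetical composite key; all field names are illustrative."""
    publication: str   # stands in for the author information
    condition: str     # experimental condition reported for the sRNA
    replicon: str      # chromosome or plasmid carrying the sRNA
    start: int
    stop: int
    strand: str        # "+" or "-"


# Two made-up entries that share a name but differ in source and coordinates.
entries = {
    SrnaKey("publication_A", "stationary phase", "replicon_1", 1000, 1120, "+"): "rnaX",
    SrnaKey("publication_B", "intracellular", "replicon_1", 1005, 1180, "+"): "rnaX",
}

# The name alone is ambiguous, the composite key is not.
names = set(entries.values())
```

Because the key combines source and sRNA properties, both entries are kept apart even though they carry the same name.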

Architecture and design 

Our public sRNAdb server is implemented in Java 1.6
on a Debian Linux PC. It uses a client-server
architecture based on Java Server Pages (JSPs), Java
Servlets, and Cascading Style Sheets (CSS). Apache
Tomcat and MySQL serve as web server and database,
respectively.

Figure 1 Database schema. The whole database with connections between tables and specific attributes is shown in UML notation. Unique
and foreign keys of each table are given in bold letters while relations between entities are stated above the connection arrows. Optional
features, which can be inserted by the user into local versions of the database, are indicated using a lighter background color than employed for
boxes representing entities.

Related sRNAs are determined using BLASTN [12],
while protein homologies are established by a
combination of BLASTCLUST and BLASTP [12]. The
addition of new data (replicons, sRNAs, terminators,
promoters, RBS, etc.) to a local installation of sRNAdb
is a simple process based
on GenBank and tab-delimited flat-files.
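
As a rough illustration of importing custom features from a tab-delimited flat-file, the following Python sketch parses such a file into records. The column layout (replicon, type, start, stop, strand, name) is an assumption made for this example; the format actually expected by sRNAdb is described on its server homepage and may differ.

```python
import csv
import io
from typing import NamedTuple


class Feature(NamedTuple):
    replicon: str
    kind: str      # e.g. "terminator" or "promoter"
    start: int
    stop: int
    strand: str
    name: str


def read_features(handle):
    """Parse tab-delimited rows, skipping blank lines and '#' comment lines."""
    features = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        replicon, kind, start, stop, strand, name = row
        features.append(Feature(replicon, kind, int(start), int(stop), strand, name))
    return features


# Made-up example content in the assumed six-column layout.
example = io.StringIO(
    "# replicon\ttype\tstart\tstop\tstrand\tname\n"
    "replicon_1\tterminator\t1125\t1150\t+\tterm_001\n"
    "\n"
    "replicon_1\tpromoter\t940\t975\t+\tprom_001\n"
)
features = read_features(example)
```

Converting coordinates to integers at import time means malformed rows fail early, before they reach the database.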

Currently, the public sRNAdb server contains 558 
gram-positive genomes and plasmids as well as 9993 
automatically predicted and 671 experimentally verified 
sRNAs. An overview is given in Table 1. 

Utility and discussion 

The sRNAdb web-database aims to collect all published
and predicted sRNAs of gram-positive bacteria for
comparative analysis. sRNAs whose size varies depending
on environmental conditions can optionally be joined
into a combined transcript. The public version of
sRNAdb contains terminators predicted by
TransTermHP [11]. Three web-interfaces are provided for
retrieval and analysis of the data. The first module is
called search and offers a rich query interface for the
database, as shown in Figure 2A. Properties of sRNAs
can be selected and filters can be defined to create
task-specific queries resulting in a tabular output
(Figure 2B). Related or customized data can also be
added to the query, based on the up- or downstream
distance to an sRNA of interest. Furthermore, a
secondary structure prediction of selected sRNA
sequences by energy minimization can be performed
using RNAfold
(http://rna.tbi.univie.ac.at/cgi-bin/RNAfold.cgi).
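
The distance-based retrieval of neighbouring features can be sketched as a simple interval query. This is a simplified Python illustration, not the query logic of the search servlet: the function name, the default distances and the coordinates are assumptions, and strand is ignored, so "upstream" here just means lower coordinates.

```python
def features_near(srna_start, srna_stop, features, upstream=100, downstream=100):
    """Return names of features overlapping the window around an sRNA.

    Coordinates are 1-based and inclusive. `features` is an iterable of
    (name, start, stop) tuples located on the same replicon as the sRNA.
    """
    window_start = max(1, srna_start - upstream)
    window_stop = srna_stop + downstream
    return [
        name
        for name, start, stop in features
        if start <= window_stop and stop >= window_start
    ]


# Made-up coordinates for illustration.
features = [
    ("prom_001", 940, 975),
    ("term_001", 1125, 1150),
    ("far_away", 5000, 5050),
]
wide = features_near(1000, 1120, features, upstream=100, downstream=100)
narrow = features_near(1000, 1120, features, upstream=10, downstream=10)
```

With the wide window both the promoter and the terminator are reported; with the narrow window only the terminator still overlaps.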

Another interface named blast (Figure 3A) was created
to enable homology searches of sRNAs versus either
public or proprietary sRNAs or whole
chromosomes/plasmids using BLASTN [12]. This can be
used for initial screening of potential genomic regions.
Concise matrix outputs for comparative analysis
purposes, as shown in Figure 3B and Figure 3C, are
implemented. Complete BLAST alignments are displayed
in Figure 3D. Sequences from the BLAST output table
can easily be selected by setting checkmarks to extract
data into a multi-FASTA formatted file, ready to serve as
input to multiple sequence alignment programs such as
CLUSTALW (http://www.ebi.ac.uk/Tools/msa/clustalw2/).
The resulting output can be used to predict structurally
conserved and thermodynamically stable RNA secondary
structures using e.g. RNAz
(http://rna.tbi.univie.ac.at/cgi-bin/RNAz.cgi),
facilitating screens for sRNA homologs across genomes.
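
Exporting selected hits to a multi-FASTA file can be sketched as follows. This Python example only illustrates the file format handed on to an alignment program; the function name, the hit identifiers and the sequences are made up, and the blast servlet itself is implemented in Java.

```python
def to_multifasta(hits, selected_ids, width=60):
    """Format the selected (id, sequence) pairs as multi-FASTA text."""
    lines = []
    for hit_id, sequence in hits:
        if hit_id not in selected_ids:
            continue
        lines.append(f">{hit_id}")
        # Wrap the sequence into lines of at most `width` characters.
        lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + ("\n" if lines else "")


# Made-up hits; ticking a checkbox corresponds to adding an id to the set.
hits = [
    ("hit_1", "ACGT" * 20),
    ("hit_2", "GGCC"),
    ("hit_3", "AAAA"),
]
fasta_text = to_multifasta(hits, {"hit_1", "hit_3"})
```

The resulting text can be written to a file and passed to a multiple sequence alignment program.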

For comprehensive visual assessment the vision servlet
(Figure 4A) was developed. This allows for a
comparative analysis of multiple, related
chromosome/plasmid loci of the genomic neighborhood
of a single sRNA of interest (single mode) as displayed
in Figure 4B. The results are translated into an image
(.png-formatted) whereby homologous genes (CDS, RNA)
of the sRNA locus are identified by BLASTP [12] and
presented with an identical colour code. Terminators
and any number of additional features previously
defined can be included as desired. Each object in the
image is associated with a popup-box, displaying further
information and linked to corresponding database
entries. The width of the resulting
image can be varied to compensate for different
screen resolutions. Thus one sRNA locus can be
compared to different chromosomes/plasmids in a
concise image output.

Table 1 Overview of the current database entries, compiled from experiments or from bioinformatic analyses

[Table body not recoverable. Surviving entries: column header Pubmed_id; Rasmussen et al. 2009 [25]; Toledo-Arana et al. 2009 [27]; Listeria monocytogenes EGD-e; values 28, 3, 22, 84, 12, 63, 21521948.]

Listed are the organisms for which sRNAs are stored in the database, together with references, the number of identified sRNAs for
each organism and the corresponding PubMed identification numbers.

Figure 2 Search servlet. A Properties of interest for each sRNA such as name, start, stop and so forth can be selected by setting check marks in
the properties section of the servlet form. sRNAs of specific organisms or publications can be selected according to settings defined in the set
limits section. Furthermore, advanced limits for detailed filtering are available. Additional features like promoters and terminators can be searched
for in the neighborhood of sRNAs of interest. B An example output from the search servlet. The resulting table contains four sRNAs named LhrA,
LhrB, LhrC and L13. The corresponding search options are shown in A. For each sRNA, properties as well as additional features (promoters) in the
surrounding area are displayed in intervals of 20 bp. The properties selected in the search servlet are also included in the output.

Figure 3 Blast servlet form and corresponding output. A FASTA-formatted sRNA sequences can be inserted into the query box. Target
genomes or sRNAs also have to be selected for multiple alignment using BLAST. For a detailed BLAST analysis the BLAST output analysis (BOA)
option has to be selected. In this example four sRNAs resulting from a search with parameters shown in Figure 1 were selected as input.
Genomes of the genus Listeria were set as targets and the BOA options were enabled. B The number of sRNAs detected in the target organism is
displayed in a comparative matrix form. C All hits are listed in a table and linked to their corresponding alignment. D A detailed BLAST alignment
of all results can also be plotted.

Figure 4 Vision servlet forms and result of single and batch mode. Different input options are available. After selecting the sRNA of interest,
replicons can be selected for visualization. Options for further analyses based on BLAST, as well as properties relating to the image output can be
set. A An example relating to the LhrC transcript is displayed. B Single mode: the resulting image shows a comparative representation of a single
sRNA candidate and flanking genes in selected organisms. When the mouse pointer is moved over these, the corresponding properties of each object are
shown in a separate popup window. C Batch mode: sRNAs displayed in Figure 1 are used as input in this example. The output matrix indicates the
occurrence of the sRNA candidates in selected organisms and their directional relationships with respect to their surrounding genes.

Pischimarov et al. BMC Genomics 2012, 13:384
http://www.biomedcentral.com/1471-2164/13/384

For the genome-wide analysis of multiple sRNA loci
an additional batch mode is available. Results from an
application of this batch mode have already been
published by Mraheil and collaborators [22]. To permit
this global analysis, an option was implemented that
enables export of the data to an Excel sheet. This
contains a visualization matrix (Figure 4C) which
indicates the occurrence of the sRNA of interest in the
target organism together with its orientation relative to
the flanking genes.
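
The structure of such an occurrence matrix can be sketched in a few lines of Python. This is an assumption-laden illustration rather than the layout of the actual export: the arrow encoding, the placeholder names and the strand data are invented for the example. Each cell is empty if the sRNA was not detected in a genome, and otherwise shows the orientation of the left flanking gene, the sRNA and the right flanking gene.

```python
ARROW = {"+": ">", "-": "<"}


def orientation_cell(hit):
    """Encode (left gene, sRNA, right gene) strands as arrows; '' if absent."""
    if hit is None:
        return ""
    return "".join(ARROW[strand] for strand in hit)


def build_matrix(hits, srnas, genomes):
    """Return one row per sRNA and one column per genome."""
    return [
        [orientation_cell(hits.get((srna, genome))) for genome in genomes]
        for srna in srnas
    ]


# Made-up detection results keyed by (sRNA, genome).
hits = {
    ("sRNA_1", "genome_A"): ("+", "+", "-"),
    ("sRNA_1", "genome_B"): ("-", "-", "+"),
    ("sRNA_2", "genome_A"): ("+", "-", "-"),
}
matrix = build_matrix(hits, ["sRNA_1", "sRNA_2"], ["genome_A", "genome_B"])
```

Rows of this kind can be written to a tab-delimited file and opened in a spreadsheet program.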

The software tool presented here is a valuable
extension to existing solutions and will assist in the
rapid analysis of large volumes of data to understand the
distribution and evolution of sRNAs in bacteria. In
contrast to other databases, the comparative batch mode
of sRNAdb's vision servlet facilitates analyses such as in
silico screening for phylogenetic markers, or
identification of drug targets related to bacterial sRNAs.
As exemplified by Mraheil and colleagues [22], a
grouping of sRNAs from pathogenic, apathogenic or
non-pathogenic bacterial strains based on the vision
servlet's result matrix allows the user to identify sRNAs
as putative phylogenetic markers. Specifically, sRNAs
found exclusively in pathogenic strains can be identified
as drug target candidates. Furthermore, after download
and local installation of sRNAdb, both the database and
the dedicated software tools are available to the user.
Since proprietary replicons or putative sRNAs can easily
be included in locally installed versions of the database,
these may be analysed making use of the full power of
sRNAdb's software tools, simplifying detailed analyses of
unpublished bacterial replicons or sRNA candidates. To
the best of the authors' knowledge, this functionality is
currently not supported by any other publicly available
sRNA database.
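
The grouping step described above, selecting sRNAs found exclusively in pathogenic strains, amounts to a set comparison on the result matrix. The Python sketch below is one possible reading of "exclusively", namely detected in at least one strain of the group of interest and in none of the others. Function name, strain labels and presence data are made up for illustration.

```python
def exclusive_to(presence, group, others):
    """Return sRNAs seen in at least one strain of `group` and in none of `others`.

    `presence` maps each sRNA name to the set of strains in which it was detected.
    """
    return sorted(
        srna
        for srna, strains in presence.items()
        if strains & group and not strains & others
    )


# Made-up presence data derived from a result matrix.
presence = {
    "sRNA_1": {"pathogenic_1", "pathogenic_2"},
    "sRNA_2": {"pathogenic_1", "nonpathogenic_1"},
    "sRNA_3": {"nonpathogenic_1"},
    "sRNA_4": set(),
}
pathogenic = {"pathogenic_1", "pathogenic_2"}
nonpathogenic = {"nonpathogenic_1"}
candidates = exclusive_to(presence, pathogenic, nonpathogenic)
```

A stricter criterion, requiring presence in every pathogenic strain, would replace `strains & group` with `group <= strains`.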

Conclusion 

sRNAdb offers biologists easy access to and analysis of
both proprietary and public data. It allows the
identification of a core set of sRNAs which can be used
as putative drug targets in antimicrobial therapeutic
approaches, as well as of specific sRNAs that may serve
as diagnostic markers for the detection of gram-positive
bacteria.